2025-05-12-20-00
CityNavAgent: Aerial Vision-and-Language Navigation with Hierarchical Semantic Planning and Global Memory
Abstract
arXiv:2505.05622v1 Announce Type: cross Abstract: Aerial vision-and-language navigation (VLN), requiring drones to interpret natural language instructions and navigate complex urban environments, emerges as a critical embodied AI challenge that bridges human-robot interaction, 3D spatial reasoning, and real-world deployment. Although existing ground VLN agents achieved notable results in indoor and outdoor settings, they struggle in aerial VLN due to the absence of predefined navigation graphs and the exponentially expanding action space in long-horizon exploration. In this work, we propose \textbf{CityNavAgent}, a large language model (LLM)-empowered agent that significantly reduces the navigation complexity for urban aerial VLN. Specifically, we design a hierarchical semantic planning module (HSPM) that decomposes the long-horizon task into sub-goals with different semantic levels. The agent reaches the target progressively by achieving sub-goals with different capacities of the LLM. Additionally, a global memory module storing historical trajectories into a topological graph is developed to simplify navigation for visited targets. Extensive benchmark experiments show that our method achieves state-of-the-art performance with significant improvement. Further experiments demonstrate the effectiveness of different modules of CityNavAgent for aerial VLN in continuous city environments. The code is available at \href{https://github.com/VinceOuti/CityNavAgent}{link}.
摘要
空中视觉与语言导航(VLN)要求无人机能够理解自然语言指令并在复杂的城市环境中自主导航,这一任务作为连接人机交互、三维空间推理与现实世界部署的关键具身智能挑战而崭露头角。尽管现有地面VLN智能体在室内外场景中取得了显著成果,但由于缺乏预定义导航图以及长时探索中呈指数级扩张的动作空间,它们在空中VLN任务中表现欠佳。本研究提出\textbf{CityNavAgent}——一个由大语言模型(LLM)驱动的智能体,可显著降低城市空中VLN的导航复杂度。具体而言,我们设计了分层语义规划模块(HSPM),将长时任务分解为不同语义层级的子目标。智能体通过利用LLM的不同能力逐步完成子目标,最终抵达终点。此外,开发了全局记忆模块,将历史轨迹存储为拓扑图以简化已访问目标的导航。大量基准实验表明,本方法以显著优势实现了最先进的性能。进一步实验验证了CityNavAgent各模块在连续城市环境中进行空中VLN的有效性。代码发布于\href{https://github.com/VinceOuti/CityNavAgent}{链接}。